ARM¶
- Reduced Instruction Set Computing (RISC)
- Less than 100 Instructions
- Instructions only operate on Registers
- ONLY Load/Store instructions can access memory.
- Most instructions support Conditional Execution
- Before ARMv3, data is little-endian only
- ARMv3 and later are bi-endian: data is little-endian by default, but the endian-ness is switchable
- Uses little-endian format for Instructions
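To make the byte-order terms above concrete, here is a minimal Python sketch that packs the same 32-bit word in both orders. The value 0x11223344 is an arbitrary illustration, not something taken from ARM documentation.

```python
import struct

value = 0x11223344  # arbitrary example word

# Little-endian: the least significant byte is stored at the lowest address.
little = struct.pack("<I", value)
# Big-endian: the most significant byte is stored at the lowest address.
big = struct.pack(">I", value)

print(little.hex())  # 44332211
print(big.hex())     # 11223344
```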
| ARM Family | ARM Architecture |
|---|---|
| ARM7 | ARM v4 |
| ARM9 | ARM v5 |
| ARM11 | ARM v6 |
| Cortex-A | ARM v7-A |
| Cortex-R | ARM v7-R |
| Cortex-M | ARM v7-M |
ARM Mode:
- Instructions are always 4 bytes, so the R15 Program Counter advances by 4 bytes per instruction
Writing Assembly¶
Use as to assemble the ASM source file into an object file
Use ld to link object files into an executable binary
as program.s -o program.o
ld program.o -o program
.string is null-terminated
.ascii is not null-terminated
Instructions¶
| Instruction | Description |
|---|---|
| MOV | Move data |
| EOR | Bitwise XOR |
| MVN | Move NOT (bitwise complement) |
| LDR | Load |
| ADD | Addition |
| STR | Store |
| SUB | Subtraction |
| LDM | Load Multiple |
| MUL | Multiplication |
| STM | Store Multiple |
| LSL | Logical Shift Left |
| PUSH | Push on Stack |
| LSR | Logical Shift Right |
| POP | Pop off Stack |
| ASR | Arithmetic Shift Right |
| B | Branch |
| ROR | Rotate Right |
| BL | Branch with Link |
| CMP | Compare |
| BX | Branch and eXchange |
| AND | Bitwise AND |
| BLX | Branch with Link and eXchange |
| ORR | Bitwise OR |
| SWI/SVC | System Call |
The Barrel Shifter applies a shift or rotate to the second operand of an instruction, so a separate shift instruction and the operation that uses its result can be combined into one.
- Rx, ASR n: Register x with arithmetic shift right by n bits (1 ≤ n ≤ 32)
- Rx, LSL n: Register x with logical shift left by n bits (0 ≤ n ≤ 31)
- Rx, LSR n: Register x with logical shift right by n bits (1 ≤ n ≤ 32)
- Rx, ROR n: Register x with rotate right by n bits (1 ≤ n ≤ 31)
- Rx, RRX: Register x with rotate right by one bit, with extend
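The shift types listed above can be modelled on 32-bit values in a few lines of Python. This is only an illustrative sketch: the function names are made up for this example, and carry-out handling is omitted except for RRX, which needs the carry flag as an input.

```python
MASK32 = 0xFFFFFFFF

def lsl(x, n):
    """Logical shift left; bits shifted past bit 31 are lost."""
    return (x << n) & MASK32

def lsr(x, n):
    """Logical shift right; zeros are shifted in from the top."""
    return (x & MASK32) >> n

def asr(x, n):
    """Arithmetic shift right; the sign bit (bit 31) is replicated."""
    x &= MASK32
    if x & 0x80000000:
        x -= 1 << 32  # reinterpret as a negative Python int
    return (x >> n) & MASK32

def ror(x, n):
    """Rotate right; bits leaving bit 0 re-enter at bit 31."""
    x &= MASK32
    return ((x >> n) | (x << (32 - n))) & MASK32

def rrx(x, carry_in):
    """Rotate right by one bit through the carry flag.

    Returns (result, carry_out)."""
    x &= MASK32
    return (carry_in << 31) | (x >> 1), x & 1

print(hex(asr(0x80000000, 1)))   # 0xc0000000
print(hex(ror(0x12345678, 8)))   # 0x78123456
```

With these helpers, an instruction such as ADD R0, R1, R2, LSL #2 corresponds to (r1 + lsl(r2, 2)) & MASK32.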
Examples:
ADD R0, R1, R2 // R1 + R2 -> R0
ADD R0, R1, #2 // R1 + 2 -> R0
LDR R2, [R0] // Load the word at the address held in R0 into R2.
LDR R1, [PC, #12] // Load the word at address PC + 12 into R1 (PC-relative addressing).
STR R2, [R1] // Store the value of R2 at the address held in R1.
STR r2, [r1, #4]! // Pre-indexed: store the value of R2 at address R1 + 4,
// then write the address back: R1 + 4 -> R1
LDR r3, [r1], #4 // Post-indexed: load the word at the address held in R1 into R3,
// then R1 + 4 -> R1
STR r2, [r1, r2, LSL#2] // Store the value of R2 at address R1 + (R2 << 2). R1 is unchanged.
STR r2, [r1, r2, LSL#2]! // Pre-indexed: store the value of R2 at address R1 + (R2 << 2),
// then R1 + (R2 << 2) -> R1
LDR r3, [r1], r2, LSL#2 // Post-indexed: load the word at the address held in R1 into R3,
// then R1 + (R2 << 2) -> R1
MOVLE R0, #5 // If the LE (signed Less Than or Equal) condition holds, 5 -> R0
MOV R0, R1, LSL #1 // R1 << 1 -> R0
adr r0, words+12 /* address of words[3] -> r0 */
ldr r1, array_buff_bridge /* address of array_buff[0] -> r1 */
ldr r2, array_buff_bridge+4 /* address of array_buff[2] -> r2 */
ldm r0, {r4,r5} /* words[3] -> r4 = 0x03; words[4] -> r5 = 0x04 */
stm r1, {r4,r5} /* r4 -> array_buff[0] = 0x03; r5 -> array_buff[1] = 0x04 */
ldmia r0, {r4-r6} /* words[3] -> r4 = 0x03, words[4] -> r5 = 0x04; words[5] -> r6 = 0x05; */
stmia r1, {r4-r6} /* r4 -> array_buff[0] = 0x03; r5 -> array_buff[1] = 0x04; r6 -> array_buff[2] = 0x05 */
ldmib r0, {r4-r6} /* words[4] -> r4 = 0x04; words[5] -> r5 = 0x05; words[6] -> r6 = 0x06 */
stmib r1, {r4-r6} /* r4 -> array_buff[1] = 0x04; r5 -> array_buff[2] = 0x05; r6 -> array_buff[3] = 0x06 */
ldmda r0, {r4-r6} /* words[3] -> r6 = 0x03; words[2] -> r5 = 0x02; words[1] -> r4 = 0x01 */
ldmdb r0, {r4-r6} /* words[2] -> r6 = 0x02; words[1] -> r5 = 0x01; words[0] -> r4 = 0x00 */
stmda r2, {r4-r6} /* r6 -> array_buff[2] = 0x02; r5 -> array_buff[1] = 0x01; r4 -> array_buff[0] = 0x00 */
stmdb r2, {r4-r5} /* r5 -> array_buff[1] = 0x01; r4 -> array_buff[0] = 0x00; */
push {r0, r1}
pop {r2, r3}
stmdb sp!, {r0, r1}
ldmia sp!, {r4, r5}
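The four LDM/STM addressing modes used above differ only in which block of word addresses they touch. The lowest-numbered register always maps to the lowest address. The sketch below models that rule. The base address 0x1000 and the helper names are hypothetical, and words[i] is assumed to hold the value i, as the comments in the listing suggest.

```python
def block_addresses(base, count, mode="ia"):
    """Return the word addresses accessed, lowest register first."""
    if mode == "ia":      # Increment After
        start = base
    elif mode == "ib":    # Increment Before
        start = base + 4
    elif mode == "da":    # Decrement After
        start = base - 4 * (count - 1)
    elif mode == "db":    # Decrement Before
        start = base - 4 * count
    else:
        raise ValueError(f"unknown mode: {mode}")
    return [start + 4 * i for i in range(count)]

WORDS_BASE = 0x1000          # hypothetical address of words[0]
words = list(range(10))      # assumption: words[i] == i

def ldm(base, count, mode):
    """Load `count` words; the list is ordered r4, r5, r6, ..."""
    return [words[(addr - WORDS_BASE) // 4]
            for addr in block_addresses(base, count, mode)]

r0 = WORDS_BASE + 12         # address of words[3], as in "adr r0, words+12"
print(ldm(r0, 3, "ia"))      # [3, 4, 5]
print(ldm(r0, 3, "ib"))      # [4, 5, 6]
print(ldm(r0, 3, "da"))      # [1, 2, 3]
print(ldm(r0, 3, "db"))      # [0, 1, 2]
```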
Immediate Values in ARM¶
An immediate value in an ARM data-processing instruction must be representable as an 8-bit value rotated right by an even number of bits within the 32-bit word.
MOV R0, #255 //Valid 0b11111111 << 0
MOV R0, #960 //Valid (0x3C0) = 0b00001111 << 6 = 0b1111000000
MOV R0, #961 //Invalid (0x3C1) = 0b1111000001 needs 10 significant bits
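Whether a constant fits can be checked mechanically. The sketch below assumes the classic ARM-state encoding (an 8-bit value rotated right by an even amount from 0 to 30). The function name is invented for this example, and the check does not cover Thumb-2 modified immediates or MOVW.

```python
def encode_arm_immediate(value):
    """Return (imm8, rotation) if value is a valid ARM immediate, else None.

    The encoded constant equals imm8 rotated right by `rotation` bits."""
    value &= 0xFFFFFFFF
    for rotation in range(0, 32, 2):
        # Undo a rotate-right by `rotation` by rotating left by the same amount.
        imm8 = ((value << rotation) | (value >> (32 - rotation))) & 0xFFFFFFFF
        if imm8 <= 0xFF:
            return imm8, rotation
    return None

print(encode_arm_immediate(255))  # (255, 0)
print(encode_arm_immediate(960))  # (15, 26)
print(encode_arm_immediate(961))  # None
```

For 960 the result reads as 15 rotated right by 26 bits, which is the same as 15 shifted left by 6.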
Data Types¶
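- Signed data: can hold negative values, at the cost of a smaller positive range
- Unsigned data: holds only non-negative values (zero included), with a larger positive range
Load instructions:
- ldr: Load Word
- ldrh: Load unsigned Half Word
- ldrsh: Load signed Half Word
- ldrb: Load unsigned Byte
- ldrsb: Load signed Byte
Store instructions:
- str: Store Word
- strh: Store Half Word
- strb: Store Byte
Stores have no signed variants (there is no strsh or strsb), because storing only truncates the register to the low half word or byte.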
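The difference between the signed and unsigned loads is whether the loaded value is zero-extended or sign-extended to 32 bits. A minimal Python sketch, assuming little-endian data and using invented helper names and an arbitrary 4-byte memory image:

```python
import struct

memory = bytes([0xFF, 0x80, 0x34, 0x12])  # arbitrary little-endian test data
MASK32 = 0xFFFFFFFF

def ldrb(mem, addr):
    """Load unsigned byte: zero-extended to 32 bits."""
    return mem[addr]

def ldrsb(mem, addr):
    """Load signed byte: sign-extended to 32 bits."""
    return struct.unpack_from("<b", mem, addr)[0] & MASK32

def ldrh(mem, addr):
    """Load unsigned half word: zero-extended to 32 bits."""
    return struct.unpack_from("<H", mem, addr)[0]

def ldrsh(mem, addr):
    """Load signed half word: sign-extended to 32 bits."""
    return struct.unpack_from("<h", mem, addr)[0] & MASK32

def ldr(mem, addr):
    """Load a full 32-bit word."""
    return struct.unpack_from("<I", mem, addr)[0]

print(hex(ldrb(memory, 0)))   # 0xff
print(hex(ldrsb(memory, 0)))  # 0xffffffff
print(hex(ldrh(memory, 0)))   # 0x80ff
print(hex(ldrsh(memory, 0)))  # 0xffff80ff
print(hex(ldr(memory, 0)))    # 0x123480ff
```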
Registers¶
- 30 General Purpose 32-bit Registers
- First 16 (R0-R15) are accessible in User-Level Mode
    - R7 (Holds the Syscall Number)
    - R11 (Frame Pointer) Points to the base of the current stack frame
    - R12 (Intra-Procedure-call scratch register)
    - R13 (Stack Pointer) Points to the top of the stack, where the most recently pushed element is
    - R14 (Link Register) Stores the return address
    - R15 (Program Counter)
        - When a Branch/Jump is executed, holds the destination address
        - Otherwise reads as the address of the current instruction plus two ARM instructions (8 bytes). Older ARM processors fetched instructions two ahead, and this behaviour is kept to ensure compatibility
- Current Program Status Register (CPSR)
    - Bit 0-4: (Processor/Privilege Mode)
    - Bit 5: (Thumb) 1 when in Thumb state
    - Bit 6: (FIQ disable)
    - Bit 7: (IRQ disable)
    - Bit 8: (Abort disable) Masks asynchronous aborts
    - Bit 9: (Endian-ness) 0 for little-endian, 1 for big-endian
    - Bit 10-15: (IT state, upper bits) If-Then execution state used by Thumb-2
    - Bit 16-19: (GE flags) Greater-than-or-Equal flags set by SIMD instructions
    - Bit 24: (Jazelle bit) Allows some ARM processors to execute Java bytecode in hardware.
    - Bit 25-26: (IT state, lower bits)
    - Bit 27: (Q, cumulative saturation) Set when a saturating arithmetic instruction saturates
    - Bit 28: (Overflow) Set when the signed result of an add, subtract, or compare is greater than or equal to 2^31, or less than -2^31.
    - Bit 29: (Carry)
        - Set when the result of an addition is greater than or equal to 2^32
        - Set when a subtraction does not borrow (the unsigned result is positive or zero)
        - Set to the last bit shifted out by an inline barrel shifter operation in a move or logical instruction.
    - Bit 30: (Zero) 1 when the result is zero
    - Bit 31: (Negative) 1 when the result is negative
Example:
mov r0, #2
mov r1, #4
cmp r1, r0 // 4-2 Carry flag is set
cmp r0, r1 // 2-4 Negative flag is set
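The flag results in the example above can be reproduced with a small model of CMP, which computes Rn - Rm and discards the result. This is a hedged sketch with an invented function name. It covers only the N, Z, C and V flags for a 32-bit subtraction.

```python
MASK32 = 0xFFFFFFFF

def cmp_flags(rn, rm):
    """Return the NZCV flags that `cmp rn, rm` would produce."""
    rn &= MASK32
    rm &= MASK32
    result = (rn - rm) & MASK32
    return {
        "N": result >> 31,                 # bit 31 of the result
        "Z": int(result == 0),             # result is zero
        "C": int(rn >= rm),                # no borrow in the unsigned subtraction
        # Signed overflow: operands have different signs and the
        # sign of the result differs from the sign of rn.
        "V": ((rn ^ rm) & (rn ^ result)) >> 31,
    }

print(cmp_flags(4, 2))  # {'N': 0, 'Z': 0, 'C': 1, 'V': 0}
print(cmp_flags(2, 4))  # {'N': 1, 'Z': 0, 'C': 0, 'V': 0}
```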
Conditionals¶
In ARM state, the condition codes below can be appended to most instructions. The instruction only executes when the flags are in the stated state.
| Condition Code | Meaning (for cmp or subs) | Status of Flags |
|---|---|---|
| EQ | Equal | Z==1 |
| NE | Not Equal | Z==0 |
| GT | Signed Greater Than | (Z==0) && (N==V) |
| LT | Signed Less Than | N!=V |
| GE | Signed Greater Than or Equal | N==V |
| LE | Signed Less Than or Equal | (Z==1) \|\| (N!=V) |
| CS or HS | Unsigned Higher or Same (or Carry Set) | C==1 |
| CC or LO | Unsigned Lower (or Carry Clear) | C==0 |
| MI | Negative (or Minus) | N==1 |
| PL | Positive or Zero (or Plus) | N==0 |
| AL | Always executed | – |
| NV | Never executed | – |
| VS | Signed Overflow | V==1 |
| VC | No signed Overflow | V==0 |
| HI | Unsigned Higher | (C==1) && (Z==0) |
| LS | Unsigned Lower or Same | (C==0) \|\| (Z==1) |
Example:
.global main
main:
mov r0, #2 @ r0 = 2
cmp r0, #3 @ computes r0 - 3 = -1, which sets the Negative flag
addlt r0, r0, #1 @ LT holds (N != V), so r0 = r0 + 1 = 3
cmp r0, #3 @ computes r0 - 3 = 0, which sets the Zero flag and clears the Negative flag
addlt r0, r0, #1 @ LT no longer holds, so this instruction is skipped
bx lr @ Return to the address in the lr register
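The example above can be traced in Python by evaluating the condition codes against the NZCV flags. This is a sketch with invented helper names. It models only CMP and a subset of the condition codes, not a full instruction set.

```python
MASK32 = 0xFFFFFFFF

def cmp_flags(rn, rm):
    """NZCV flags produced by `cmp rn, rm` (a 32-bit subtraction)."""
    rn &= MASK32
    rm &= MASK32
    result = (rn - rm) & MASK32
    return {
        "N": result >> 31,
        "Z": int(result == 0),
        "C": int(rn >= rm),
        "V": ((rn ^ rm) & (rn ^ result)) >> 31,
    }

# Each condition code is a predicate over the flags.
CONDITIONS = {
    "EQ": lambda f: f["Z"] == 1,
    "NE": lambda f: f["Z"] == 0,
    "GT": lambda f: f["Z"] == 0 and f["N"] == f["V"],
    "LT": lambda f: f["N"] != f["V"],
    "GE": lambda f: f["N"] == f["V"],
    "LE": lambda f: f["Z"] == 1 or f["N"] != f["V"],
    "HS": lambda f: f["C"] == 1,
    "LO": lambda f: f["C"] == 0,
    "HI": lambda f: f["C"] == 1 and f["Z"] == 0,
    "LS": lambda f: f["C"] == 0 or f["Z"] == 1,
}

def condition_passed(cond, flags):
    return CONDITIONS[cond.upper()](flags)

def run_example():
    r0 = 2                              # mov r0, #2
    flags = cmp_flags(r0, 3)            # cmp r0, #3
    if condition_passed("LT", flags):   # addlt r0, r0, #1
        r0 += 1
    flags = cmp_flags(r0, 3)            # cmp r0, #3
    if condition_passed("LT", flags):   # addlt r0, r0, #1 (skipped)
        r0 += 1
    return r0

print(run_example())  # 3
```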
IF-THEN-(Else) Conditional Instruction¶
The IT (If-Then) instruction makes up to four of the following Thumb instructions conditional, much like a small if/else construct.
- IT: If-Then (if TRUE, execute the next instruction)
- ITT: If-Then-Then (if TRUE, execute the next 2 instructions)
- ITE: If-Then-Else (if TRUE, execute the next instruction and skip the one after it; if FALSE, skip the next instruction and execute the one after it)
- ITTE: If-Then-Then-Else (if TRUE, execute the next 2 instructions and skip the third; if FALSE, skip the next 2 instructions and execute the third)
- ITTEE: If-Then-Then-Else-Else (if TRUE, execute the next 2 instructions and skip the 2 after them; if FALSE, skip the next 2 instructions and execute the 2 after them)
Example:
ITTE NE ; Next 3 instructions are conditional
ANDNE R0, R0, R1 ; ANDNE does not update condition flags
ADDSNE R2, R2, #1 ; ADDSNE updates condition flags
MOVEQ R2, R3 ; Conditional move Where EQ is the Inverse of NE
ITE GT ; Next 2 instructions are conditional
ADDGT R1, R0, #55 ; Conditional addition in case the GT is true
ADDLE R1, R0, #48 ; Conditional addition in case the GT is not true
ITTEE EQ ; Next 4 instructions are conditional
MOVEQ R0, R1 ; Conditional MOV
ADDEQ R2, R2, #10 ; Conditional ADD
ANDNE R3, R3, #1 ; Conditional AND
BNE.W dloop ; Branch instruction can only be used in the last instruction of an IT block
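The relationship between an IT mnemonic and the condition suffixes required on the following instructions can be written down as a short Python sketch. The function name and the inverse-condition table are illustrative helpers, not part of any assembler API.

```python
# Each condition code paired with its logical inverse.
INVERSE = {
    "EQ": "NE", "NE": "EQ", "CS": "CC", "CC": "CS",
    "HS": "LO", "LO": "HS", "MI": "PL", "PL": "MI",
    "VS": "VC", "VC": "VS", "HI": "LS", "LS": "HI",
    "GE": "LT", "LT": "GE", "GT": "LE", "LE": "GT",
}

def it_block_conditions(mnemonic, cond):
    """Return the condition each instruction in the IT block must carry.

    `mnemonic` is IT followed by up to three T/E letters, e.g. "ITTE"."""
    mnemonic = mnemonic.upper()
    cond = cond.upper()
    if not mnemonic.startswith("IT") or len(mnemonic) > 5:
        raise ValueError(f"not an IT mnemonic: {mnemonic}")
    pattern = mnemonic[1:]  # the first "T" covers the first instruction
    if any(letter not in "TE" for letter in pattern):
        raise ValueError(f"not an IT mnemonic: {mnemonic}")
    # T keeps the base condition, E uses its inverse.
    return [cond if letter == "T" else INVERSE[cond] for letter in pattern]

print(it_block_conditions("ITTE", "NE"))   # ['NE', 'NE', 'EQ']
print(it_block_conditions("ITE", "GT"))    # ['GT', 'LE']
print(it_block_conditions("ITTEE", "EQ"))  # ['EQ', 'EQ', 'NE', 'NE']
```

The three results match the suffixes used in the ITTE NE, ITE GT and ITTEE EQ blocks above.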
Branching¶
Branch (B): Simple jump to a label
Branch link (BL): Saves the address of the next instruction (the current instruction address + 4 in ARM state) in the LR register and jumps to the function
Branch exchange (BX): Jumps to the address held in a register and switches instruction set (ARM <-> Thumb) according to bit 0 of that address
Branch link exchange (BLX): Saves the return address in the LR register, jumps to the function, and can switch instruction set (ARM <-> Thumb)
Switch THUMB Mode:
.text
.global _start
_start:
.code 32 @ ARM mode
add r2, pc, #1 @ put PC+1 into R2
bx r2 @ branch + exchange to R2
.code 16 @ Thumb mode
mov r0, #1
Conditional Branch Example:
.text
.global _start
_start:
mov r0, #2 @ r0 = 2
mov r1, #2 @ r1 = 2
add r0, r0, r1 @ r0 = r0 + r1
cmp r0, #4 @ compare r0 with 4
beq func1 @ if r0 == 4 jump to func1
add r1, #5 @ else r1 = r1 + 5
b func2 @ jump to func2
func1:
mov r1, r0 @ r1 = r0
bx lr @ jump to the address in lr
func2:
mov r0, r1 @ r0 = r1
bx lr @ jump to the address in lr
Stack¶
The stack can grow up or down.
If the stack grows toward lower memory addresses, it is a descending stack.
If the stack grows toward higher memory addresses, it is an ascending stack.
If the stack pointer points to the last item pushed, it is a full stack.
If the stack pointer points to the next free slot, it is an empty stack.
| Stack Type | Store Instruction | Load Instruction |
|---|---|---|
| Full descending | STMFD (STMDB, Decrement Before) | LDMFD (LDM, Increment after) |
| Full ascending | STMFA (STMIB, Increment Before) | LDMFA (LDMDA, Decrement After) |
| Empty descending | STMED (STMDA, Decrement After) | LDMED (LDMIB, Increment Before) |
| Empty ascending | STMEA (STM, Increment after) | LDMEA (LDMDB, Decrement Before) |
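As an illustration of the first table row, here is a small Python model of a full descending stack, the convention that PUSH and POP follow. The class name and the starting address 0x8000 are assumptions made for this sketch.

```python
class FullDescendingStack:
    """Models STMDB sp!, {...} (push) and LDMIA sp!, {...} (pop)."""

    def __init__(self, sp):
        self.sp = sp          # points at the last item pushed (full stack)
        self.memory = {}      # word address -> 32-bit value

    def push(self, *values):
        # Decrement Before: move SP down first, then store.
        # The lowest-numbered register ends up at the lowest address.
        self.sp -= 4 * len(values)
        for i, value in enumerate(values):
            self.memory[self.sp + 4 * i] = value & 0xFFFFFFFF

    def pop(self, count):
        # Increment After: load starting at SP, then move SP up.
        values = [self.memory[self.sp + 4 * i] for i in range(count)]
        self.sp += 4 * count
        return values

stack = FullDescendingStack(sp=0x8000)
stack.push(0x11, 0x22)        # push {r0, r1}
print(hex(stack.sp))          # 0x7ff8
print(stack.pop(2))           # [17, 34]  -> pop {r2, r3}
print(hex(stack.sp))          # 0x8000
```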
Thumb Mode¶
Thumb-1:
- 16 bit Instructions
- Instructions are always 2 bytes, so the R15 Program Counter advances by 2 bytes per instruction
- Used in ARMv6 and earlier
Thumb-2:
- Extends Thumb-1
- 16 bit or 32 bit Instructions
- A .w suffix on an instruction selects the 32 bit encoding
- Used in ARMv6T2, ARMv7
- Instructions are 2 or 4 bytes long and 2-byte aligned
- Conditional Execution using the IT instruction
ThumbEE:
- A Thumb-2 variant intended for code compiled on the device either shortly before or during execution.
Switching state¶
Switching to Thumb mode:
1. Use the BX (Branch Exchange) or the BLX (Branch Link and Exchange) instruction and set the least significant bit of the destination register to 1.
- This does not cause alignment issues because the processor will ignore the last bit.
2. We know that we are in Thumb mode if the T bit in the current program status register is set.
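The effect of the least significant bit can be sketched in Python. The helper names and the address 0x10000 are hypothetical. The model assumes that reading PC in ARM state yields the current instruction address plus 8, which is how add r2, pc, #1 in the earlier Thumb switch listing ends up pointing just past the bx.

```python
def arm_pc_read(instruction_address):
    """Value seen when an ARM-state instruction reads PC."""
    return instruction_address + 8

def bx(target):
    """Return (new_pc, thumb_state) for a BX to `target`."""
    thumb = bool(target & 1)          # bit 0 selects the instruction set
    new_pc = target & ~1 & 0xFFFFFFFF  # bit 0 is not part of the address
    return new_pc, thumb

# add r2, pc, #1 located at 0x10000, followed by bx r2 at 0x10004
r2 = arm_pc_read(0x10000) + 1
new_pc, thumb = bx(r2)
print(hex(r2))       # 0x10009
print(hex(new_pc))   # 0x10008
print(thumb)         # True
```

Execution continues in Thumb state at 0x10008, the first instruction after the bx.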
Emulating ARM with Unicorn¶
from unicorn import *
from unicorn.arm_const import *
from unicorn.unicorn_const import *
from capstone import *
import binascii

STACK_SIZE = 2 * 1024 * 1024
RAM_SIZE = 8 * 1024 * 1024

# Create a new instance of Capstone for 32-bit ARM code
capmd = Cs(CS_ARCH_ARM, CS_MODE_ARM)

# Disassemble a single instruction and print its mnemonic and operands
def disas_single(data, addr):
    for i in capmd.disasm(data, addr):
        print(f"0x{i.address:x}:\t{i.mnemonic}\t{i.op_str}")
        break

# Callback of the code hook, called for every emulated instruction
def hook_code(uc, addr, size, user_data):
    mem = uc.mem_read(addr, size)
    disas_single(bytes(mem), addr)

# Map every region of map_data into the emulator and write its data
def map_memory(unicorn_obj, map_data, align_size=(1024 * 1024), default_perm=UC_PROT_ALL):
    next_free_block = 0x0
    for data_info in map_data:
        # Set size if not set (rounded up to a multiple of align_size)
        if data_info.get("size") is None:
            data_info["size"] = ((len(data_info["data"]) // align_size) + 1) * align_size
        # Set permissions if not set
        if data_info.get("permissions") is None:
            data_info["permissions"] = default_perm
        # Move the region up if its requested base overlaps the previous region
        memory_loc = max(data_info["base"], next_free_block)
        # Remember where the region was actually mapped
        data_info["address"] = memory_loc
        # Map the memory to the unicorn obj
        unicorn_obj.mem_map(memory_loc, data_info["size"], perms=data_info["permissions"])
        # Write the memory
        unicorn_obj.mem_write(memory_loc, data_info["data"])
        # Update the next possible write location
        next_free_block = memory_loc + data_info["size"]

# Return the address at which the region with the given tag was mapped
def get_address(map_data, tag_name):
    for data_info in map_data:
        if data_info["tag"] == tag_name:
            return data_info["address"]
    raise KeyError(tag_name)

# Code to be emulated
with open("u-boot.bin", "rb") as in_file:
    ARM_CODE32 = in_file.read()

# File to be decrypted
with open("kernel.img.raw", "rb") as in_file:
    FILE_TOBE_DEC = in_file.read()

print("Emulate ARM code")
print("Shielder")
try:
    # Initialize emulator in ARM-32bit mode
    # with "ARM" ARM instruction set
    mu = Uc(UC_ARCH_ARM, UC_MODE_ARM)

    # Map memory from the region list, in ascending address order
    # Stack | RAM | U-Boot
    mem_map = [
        {"tag": "stack", "base": 0x00000000, "data": b"\x00" * STACK_SIZE},
        # The RAM starts with the file to decrypt, padded with zeros
        {"tag": "ram", "base": 0x00000000, "data": FILE_TOBE_DEC.ljust(RAM_SIZE, b"\x00")},
        {"tag": "uboot", "base": 0x80800000, "data": ARM_CODE32},
    ]
    map_memory(mu, mem_map)
    ram_addr = get_address(mem_map, "ram")
    uboot_addr = get_address(mem_map, "uboot")

    # Initialize machine registers
    # The stack grows downward, so SP starts at the top of the stack region
    mu.reg_write(UC_ARM_REG_SP, get_address(mem_map, "stack") + STACK_SIZE)
    # First argument, memory pointer to the location of the file
    mu.reg_write(UC_ARM_REG_R0, ram_addr)
    # Second argument, memory pointer to the location to which the file is written
    mu.reg_write(UC_ARM_REG_R1, ram_addr)
    # Third argument, block size to be read from the memory pointed to by r0
    mu.reg_write(UC_ARM_REG_R2, 512)

    # Hook every instruction and disassemble it with Capstone
    mu.hook_add(UC_HOOK_CODE, hook_code)

    # Emulate code without a time limit
    # Address + start/end of the block_aes_decrypt function
    mu.emu_start(uboot_addr + 0x8c40, uboot_addr + 0x8c44)

    # Now print out some registers
    print("Emulation done. Below is the CPU context")
    r_r0 = mu.reg_read(UC_ARM_REG_R0)
    r_r1 = mu.reg_read(UC_ARM_REG_R1)
    r_r2 = mu.reg_read(UC_ARM_REG_R2)
    r_pc = mu.reg_read(UC_ARM_REG_PC)
    print(f">>> r0 = 0x{r_r0:x}")
    print(f">>> r1 = 0x{r_r1:x}")
    print(f">>> r2 = 0x{r_r2:x}")
    print(f">>> pc = 0x{r_pc:x}")

    print("\nReading data from the first 512 bytes of the RAM at: " + hex(ram_addr))
    print("==== BEGIN ====")
    ram_data = mu.mem_read(ram_addr, 512)
    print(str(binascii.hexlify(ram_data)))
    print("==== END ====")

    # From the reversed binary, we know which are the magic bytes
    # at the beginning of the kernel
    if b"27051956" == binascii.hexlify(bytearray(ram_data[:4])):
        print("\nMagic Bytes match :)\n\n")
    with open("test.bin", "wb") as f:
        f.write(ram_data)
except UcError as e:
    print("ERROR: %s" % e)